binary data
- Asia > China (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
The normalization of (almost) everything: Our minds can get used to anything, and even crises start feeling normal Science
For a long time, many climate scientists and advocates held onto an optimistic belief that once the impacts of climate change became undeniable, people and governments would act. But whereas the predictions of climate models have increasingly been borne out, the assumptions about human behavior have not. Even as disasters mount, climate change remains low on voters' priority lists, and policy responses remain tepid. To me, this gap reflects a deeper failure, not just in policy or communication, but in how we understand human adaptability. When I began my career as a computational cognitive scientist, I was drawn to a defining strength of human cognition: a marked ability to adapt.
Differentiable Structure Learning for General Binary Data
Existing methods for differentiable structure learning in discrete data typically assume that the data are generated from specific structural equation models. However, these assumptions may not align with the true data-generating process, which limits the general applicability of such methods. Furthermore, current approaches often ignore the complex dependence structure inherent in discrete data and consider only linear effects. We propose a differentiable structure learning framework that is capable of capturing arbitrary dependencies among discrete variables. We show that although general discrete models are unidentifiable from purely observational data, it is possible to characterize the complete set of compatible parameters and structures. Additionally, we establish identifiability up to Markov equivalence under mild assumptions. We formulate the learning problem as a single differentiable optimization task in the most general form, thereby avoiding the unrealistic simplifications adopted by previous methods. Empirical results demonstrate that our approach effectively captures complex relationships in discrete data.
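The abstract does not say how acyclicity is enforced inside the single differentiable objective. Differentiable structure learning commonly uses a smooth function of a weighted adjacency matrix that is zero exactly when the graph is a DAG. The minimal sketch below uses a polynomial variant of the NOTEARS constraint, h(W) = tr[(I + W∘W/d)^d] − d. It illustrates that general technique and is not necessarily the penalty this paper uses. The function name is hypothetical.

```python
import numpy as np

def acyclicity_penalty(W: np.ndarray) -> float:
    """Polynomial acyclicity penalty h(W) = tr[(I + (W*W)/d)^d] - d.

    h(W) == 0 exactly when the weighted graph encoded by W has no directed
    cycles, and h(W) > 0 otherwise. It is smooth in W, so it can be added
    to a differentiable loss as a penalty or equality constraint.
    """
    d = W.shape[0]
    A = W * W                      # elementwise square: nonnegative edge weights
    M = np.eye(d) + A / d
    # trace of A^k counts weighted closed walks of length k; all vanish on a DAG
    return float(np.trace(np.linalg.matrix_power(M, d)) - d)

# Usage: an upper-triangular (acyclic) graph versus a 2-cycle
W_dag = np.array([[0.0, 1.5, 0.0],
                  [0.0, 0.0, -0.7],
                  [0.0, 0.0, 0.0]])
W_cyc = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
h_dag = acyclicity_penalty(W_dag)
h_cyc = acyclicity_penalty(W_cyc)
```

Squaring W elementwise makes every term of the trace nonnegative. Negative edge weights therefore cannot cancel a cycle out of the penalty.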
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Italy (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Binary and Ternary Quantization Can Enhance Feature Discrimination
Lu, Weizhi, Chen, Mingrui, Li, Weiyu
Quantization is widely applied in machine learning to reduce computational and storage costs for both data and models. Considering that classification tasks are fundamental to the field, it is crucial to investigate how quantization impacts classification performance. Traditional research has focused on quantization errors, assuming that larger errors generally lead to lower classification accuracy. However, this assumption lacks a solid theoretical foundation and often contradicts empirical observations. For example, despite introducing significant errors, $\{0,1\}$-binary and $\{0, \pm1\}$-ternary quantized data have sometimes achieved classification accuracy comparable or even superior to full-precision data. To reasonably explain this phenomenon, a more accurate evaluation of classification performance is required. To achieve this, we propose a direct analysis of the feature discrimination of quantized data, instead of focusing on quantization errors. Our analysis reveals that both binary and ternary quantization can potentially enhance, rather than degrade, the feature discrimination of the original data. This finding is supported by classification experiments conducted on both synthetic and real data.
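The two quantizers compared above can be sketched as elementwise thresholding. The threshold `tau` and the function names are assumptions for illustration. The abstract does not say how thresholds are chosen, and this sketch does not reproduce the paper's feature-discrimination analysis.

```python
import numpy as np

def binary_quantize(x: np.ndarray, tau: float = 0.0) -> np.ndarray:
    # {0,1}-binary: 1 where the value exceeds the threshold, else 0
    return (x > tau).astype(np.int8)

def ternary_quantize(x: np.ndarray, tau: float) -> np.ndarray:
    # {0,±1}-ternary: keep the sign where |x| exceeds tau, zero out the rest
    return (np.sign(x) * (np.abs(x) > tau)).astype(np.int8)

x = np.array([-1.2, -0.1, 0.05, 0.8])
b = binary_quantize(x)          # [0, 0, 1, 1]
t = ternary_quantize(x, 0.5)    # [-1, 0, 0, 1]
```

Both maps discard most of the magnitude information, so the quantization error is large. The paper's claim is that class separation can nonetheless survive or even improve.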
- North America > Canada > Ontario > Toronto (0.14)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
Archetypal Analysis for Binary Data
Wedenborg, A. Emilie J., Mørup, Morten
Archetypal analysis (AA) is a matrix decomposition method that identifies distinct patterns, denoted archetypes, as convex combinations of the data points. Each data point is in turn reconstructed as a convex combination of the archetypes. AA thereby forms a polytope representing trade-offs among the distinct aspects of the data. Most existing methods for AA are designed for continuous data and do not exploit the structure of the data distribution.

In this paper, we propose two new optimization frameworks for archetypal analysis of binary data:

i) a second-order approximation of the AA likelihood based on the Bernoulli distribution, with efficient closed-form updates. An active set procedure learns the convex combinations defining the archetypes, and a sequential minimal optimization strategy learns the observation-specific reconstructions.

ii) a Bernoulli likelihood-based version of the principal convex hull analysis (PCHA) algorithm, which was originally developed for least squares optimization.

We compare these approaches with the only existing binary AA procedure, which relies on multiplicative updates, and demonstrate their superiority on both synthetic and real binary data. Notably, the proposed optimization frameworks can easily be extended to other data distributions. This provides generic, efficient optimization frameworks for AA based on tailored likelihood functions that reflect the underlying data distribution.
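The structure described above can be written as X ≈ XCS, where C and S are column-stochastic. Archetypes are convex combinations of data points, and reconstructions are convex combinations of archetypes. Below is a minimal sketch that evaluates the Bernoulli negative log-likelihood for binary X under this structure. It only computes the objective. It does not implement the active-set, sequential minimal optimization, or PCHA updates, and the helper names are hypothetical.

```python
import numpy as np

def random_simplex_columns(rows: int, cols: int, rng: np.random.Generator) -> np.ndarray:
    # Each column is a point on the probability simplex (nonnegative, sums to 1).
    M = rng.random((rows, cols))
    return M / M.sum(axis=0, keepdims=True)

def bernoulli_aa_nll(X: np.ndarray, C: np.ndarray, S: np.ndarray, eps: float = 1e-9) -> float:
    """Negative Bernoulli log-likelihood of binary X (D x N) under AA.

    Archetypes A = X @ C (D x K) are convex combinations of data points, and
    reconstructions P = A @ S (D x N) are convex combinations of archetypes.
    Every entry of P therefore lies in [0, 1] and can serve as a Bernoulli mean.
    """
    P = np.clip(X @ C @ S, eps, 1 - eps)   # guard against log(0)
    return float(-np.sum(X * np.log(P) + (1 - X) * np.log(1 - P)))

rng = np.random.default_rng(0)
X = (rng.random((20, 50)) < 0.3).astype(float)   # 20 binary features, 50 observations
C = random_simplex_columns(50, 4, rng)            # 4 archetypes built from data points
S = random_simplex_columns(4, 50, rng)            # per-observation mixing weights
nll = bernoulli_aa_nll(X, C, S)
```

The convexity constraints keep the reconstruction a valid probability without a link function. That is why the Bernoulli likelihood can be used directly.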
- North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
- North America > Montserrat (0.04)
- Europe > Denmark (0.04)
- Asia > India > Telangana > Hyderabad (0.04)
Tabular Data Generation using Binary Diffusion
Kinakh, Vitaliy, Voloshynovskiy, Slava
Generating synthetic tabular data is critical in machine learning, especially when real data is limited or sensitive. Traditional generative models often face challenges due to the unique characteristics of tabular data, such as mixed data types and varied distributions, and require complex preprocessing or large pretrained models. In this paper, we introduce a novel, lossless binary transformation method that converts any tabular data into fixed-size binary representations, and a corresponding new generative model called Binary Diffusion, specifically designed for binary data. Binary Diffusion leverages the simplicity of XOR operations for noise addition and removal and employs binary cross-entropy loss for training. Our approach eliminates the need for extensive preprocessing, complex noise parameter tuning, and pretraining on large datasets. We evaluate our model on several popular tabular benchmark datasets, demonstrating that Binary Diffusion outperforms existing state-of-the-art models on Travel, Adult Income, and Diabetes datasets while being significantly smaller in size.
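The abstract names two mechanisms: a lossless transform of tabular values into fixed-size bit vectors, and XOR-based noise addition and removal. The sketch below illustrates both under simple assumptions. A numeric column is encoded by reinterpreting float32 values as their 32 raw bits; this is our choice, not necessarily the paper's encoding. The forward step flips each bit with a fixed probability. In the actual model, a network trained with binary cross-entropy predicts the noise. Here the true mask stands in for that prediction. All function names are hypothetical.

```python
import numpy as np

SHIFTS = np.arange(32, dtype=np.uint32)

def float32_to_bits(values) -> np.ndarray:
    # Lossless: reinterpret each float32 as its 32 raw bits (least significant bit first).
    raw = np.asarray(values, dtype=np.float32).view(np.uint32)
    return ((raw[:, None] >> SHIFTS) & np.uint32(1)).astype(np.uint8)

def bits_to_float32(bits: np.ndarray) -> np.ndarray:
    # Inverse transform: reassemble the 32 bits of each row and reinterpret as float32.
    raw = np.bitwise_or.reduce(bits.astype(np.uint32) << SHIFTS, axis=1)
    return raw.astype(np.uint32).view(np.float32)

def add_xor_noise(x0: np.ndarray, flip_prob: float, rng: np.random.Generator):
    # Forward noising: flip each bit independently with probability flip_prob.
    noise = (rng.random(x0.shape) < flip_prob).astype(np.uint8)
    return x0 ^ noise, noise

def remove_xor_noise(xt: np.ndarray, noise_estimate: np.ndarray) -> np.ndarray:
    # XOR is its own inverse, so (x0 ^ n) ^ n == x0 when the estimate is exact.
    return xt ^ noise_estimate

rng = np.random.default_rng(0)
column = [1.5, -2.75, 0.001]
x0 = float32_to_bits(column)                 # shape (3, 32), entries in {0, 1}
xt, noise = add_xor_noise(x0, 0.2, rng)
restored = bits_to_float32(remove_xor_noise(xt, noise))
```

XOR keeps every intermediate state binary. A per-bit binary cross-entropy loss on the predicted noise is therefore a natural fit.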
- North America > United States > California (0.06)
- Europe > Switzerland > Geneva > Geneva (0.04)
Distributed Flexible Nonlinear Tensor Factorization
Zhang, Kai, Wang, Pengyuan, Lee, Kuang-chih
Tensor factorization is a powerful tool for analyzing multi-way data. Recently proposed nonlinear factorization methods can capture complex relationships, but they are computationally expensive and may suffer severe learning bias under extreme data sparsity. We therefore propose a distributed, flexible nonlinear tensor factorization model. It avoids the expensive computations and structural restrictions of the Kronecker product in existing tensor-variate Gaussian process (TGP) formulations, allowing an arbitrary subset of tensor entries to be selected for training. In addition, we derive a tractable and tight variational evidence lower bound (ELBO) that enables highly decoupled, parallel computation and high-quality inference.
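To see why selecting an arbitrary subset of entries helps, consider a GP over tensor entries in which the input for entry (i, j, k) is the concatenation of the corresponding latent factor rows. This construction is an assumption for illustration; the abstract does not state the kernel or the input mapping. Under it, the covariance only needs to be formed over the n selected entries, giving an n × n matrix. A Kronecker-product covariance instead spans the full grid of entries. The function names and the RBF kernel are hypothetical choices.

```python
import numpy as np

def entry_inputs(factors: list, indices: np.ndarray) -> np.ndarray:
    """Concatenate the latent factor rows selected by each index tuple.

    factors: list of K arrays; factors[k] has shape (n_k, r_k)
    indices: int array of shape (n_entries, K), one tensor entry per row
    """
    return np.hstack([U[indices[:, k]] for k, U in enumerate(factors)])

def rbf_kernel(Z: np.ndarray, lengthscale: float = 1.0) -> np.ndarray:
    # Squared-exponential kernel on the concatenated factor inputs.
    sq = np.sum(Z**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * Z @ Z.T
    return np.exp(-np.maximum(d2, 0.0) / (2 * lengthscale**2))

rng = np.random.default_rng(0)
shape, rank = (30, 40, 50), 3                  # 60,000 entries in the full tensor
factors = [rng.standard_normal((n, rank)) for n in shape]
idx = np.stack([rng.integers(0, n, size=100) for n in shape], axis=1)
Z = entry_inputs(factors, idx)                 # (100, 9): one input per selected entry
K = rbf_kernel(Z)                              # (100, 100) covariance over the subset
```

The covariance grows with the number of selected entries rather than with the tensor's full size. Sparse observed entries can therefore be used without the cost of the complete grid.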
- Asia > China (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
Beyond Language Models: Byte Models are Digital World Simulators
Wu, Shangda, Tan, Xu, Wang, Zili, Wang, Rui, Li, Xiaobing, Sun, Maosong
Traditional deep learning often overlooks bytes, the basic units of the digital world, where all forms of information and operations are encoded and manipulated in binary format. Inspired by the success of next token prediction in natural language processing, we introduce bGPT, a model with next byte prediction to simulate the digital world. bGPT matches specialized models in performance across various modalities, including text, audio, and images, and offers new possibilities for predicting, simulating, and diagnosing algorithm or hardware behaviour. It has almost flawlessly replicated the process of converting symbolic music data, achieving a low error rate of 0.0011 bits per byte in converting ABC notation to MIDI format. In addition, bGPT demonstrates exceptional capabilities in simulating CPU behaviour, with an accuracy exceeding 99.99% in executing various operations. Leveraging next byte prediction, models like bGPT can directly learn from vast binary data, effectively simulating the intricate patterns of the digital world.
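Next-byte prediction treats raw bytes as tokens from a fixed 256-symbol vocabulary. The error rate quoted above is in bits per byte, the average negative log2 probability a model assigns to each actual next byte. A minimal sketch of both pieces follows. The model itself is omitted, the uniform predictor in the usage lines is a placeholder rather than bGPT, and the function names are hypothetical.

```python
import numpy as np

def to_byte_tokens(data: bytes) -> np.ndarray:
    # Any file is already a token sequence over a fixed 256-symbol vocabulary.
    return np.frombuffer(data, dtype=np.uint8).astype(np.int64)

def bits_per_byte(probs: np.ndarray, targets: np.ndarray) -> float:
    """Average -log2 p(next byte) over a sequence.

    probs:   (n, 256) predicted distributions, one row per position
    targets: (n,) the byte that actually occurred at each position
    """
    p = probs[np.arange(len(targets)), targets]
    return float(-np.mean(np.log2(p)))

tokens = to_byte_tokens(b"X:1\nK:C\nCDEF|")       # a fragment of ABC notation
uniform = np.full((len(tokens), 256), 1 / 256)
bpb = bits_per_byte(uniform, tokens)             # uniform guessing costs 8 bits per byte
```

Lower is better. A score of 0.0011 bits per byte means the model is nearly certain of almost every byte it predicts.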
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Europe > Italy > Lombardy > Milan (0.04)
- Asia > China (0.04)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
- Information Technology (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)
Bitformer: An efficient Transformer with bitwise operation-based attention for Big Data Analytics at low-cost low-precision devices
Duan, Gaoxiang, Zhang, Junkai, Zheng, Xiaoying, Zhu, Yongxin
In the current landscape of large models, the Transformer is a cornerstone architecture. However, its attention mechanism is computationally expensive. Its reliance on high-precision floating-point operations is a particular obstacle in computation-intensive settings such as edge computing, where devices are resource-constrained and favor lower precision. To meet the data processing demands of edge devices, we introduce the Bitformer model, an extension of the Transformer built around a novel attention mechanism that replaces conventional floating-point matrix multiplication with bitwise operations. This substitution has two advantages. It preserves the attention mechanism's ability to capture long-range dependencies, and it substantially reduces the computational complexity of the attention operation. Complexity drops from $O(n^2d)$ for floating-point operations to $O(n^2T)$ for bitwise operations, where the parameter $T$ is markedly smaller than the conventional dimensionality parameter $d$. Bitformer thus aims to reconcile the demands of modern models with the constraints of edge computing, bridging the gap between high-performing models and resource-scarce environments and opening a path for further work in this area.
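The abstract does not specify how Bitformer's bitwise attention is computed. Binarized networks widely use a well-known way to replace a floating-point dot product with bitwise operations. Queries and keys are binarized to ±1 signs and packed into integers. For d-dimensional sign vectors, q·k = d − 2·popcount(q XOR k). The sketch below shows that identity feeding a softmax. It is a swapped-in illustration, not Bitformer's mechanism, and the function names are hypothetical.

```python
import numpy as np

def pack_signs(x: np.ndarray) -> list:
    # Bit j of row i is 1 where x[i, j] >= 0 (sign +1), else 0 (sign -1).
    packed = []
    for row in x >= 0:
        value = 0
        for j, bit in enumerate(row):
            if bit:
                value |= 1 << j
        packed.append(value)
    return packed

def bitwise_scores(q_bits: list, k_bits: list, d: int) -> np.ndarray:
    # For ±1 vectors: dot = matches - mismatches = d - 2 * popcount(q XOR k).
    scores = np.empty((len(q_bits), len(k_bits)), dtype=np.int64)
    for i, qa in enumerate(q_bits):
        for j, kb in enumerate(k_bits):
            scores[i, j] = d - 2 * bin(qa ^ kb).count("1")
    return scores

def softmax_rows(s: np.ndarray) -> np.ndarray:
    # Numerically stable row-wise softmax over the attention scores.
    e = np.exp(s - s.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
Q = rng.standard_normal((4, 16))
K = rng.standard_normal((5, 16))
scores = bitwise_scores(pack_signs(Q), pack_signs(K), d=16)
weights = softmax_rows(scores / np.sqrt(16))
```

Only XOR and bit counting touch the query-key data. On hardware with a popcount instruction, each score costs a few word-sized integer operations rather than d floating-point multiply-adds.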
- Research Report > New Finding (0.68)
- Research Report > Promising Solution (0.48)
Unsupervised Learning of Mixtures of Multiple Causes in Binary Data
This paper presents a formulation for unsupervised learning of clusters reflecting multiple causal structure in binary data. Unlike the standard mixture model, a multiple cause model accounts for observed data by combining assertions from many hidden causes, each of which can pertain to varying degree to any subset of the observable dimensions. A crucial issue is the mixing function that combines beliefs from different cluster centers to generate data reconstructions. The errors of these reconstructions are minimized both during recognition and during learning. We demonstrate a weakness inherent to the popular weighted sum followed by sigmoid squashing, and offer an alternative form of the nonlinearity. Results demonstrate the algorithm's ability to successfully discover coherent multiple causal representations in noisy test data and in images of printed characters.
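The abstract does not give the alternative nonlinearity. A noisy-OR style soft disjunction is one standard way to combine multiple binary causes:

r_j = 1 − Π_k (1 − m_k c_jk), with cause activations m_k and associations c_jk in [0, 1].

We use it here for illustration and do not claim it is the paper's exact form. The sketch contrasts it with a weighted sum followed by a sigmoid, and the function names are hypothetical.

```python
import numpy as np

def sigmoid_mix(m: np.ndarray, C: np.ndarray, bias: float = 0.0) -> np.ndarray:
    # Weighted sum of cause activations followed by sigmoid squashing.
    return 1.0 / (1.0 + np.exp(-(C @ m + bias)))

def noisy_or_mix(m: np.ndarray, C: np.ndarray) -> np.ndarray:
    # Soft disjunction: dimension j stays off only if every active cause fails to turn it on.
    return 1.0 - np.prod(1.0 - C * m[None, :], axis=1)

# 3 observable dimensions, 2 hidden causes; cause 0 covers dims 0 and 2, cause 1 covers dims 1 and 2
C = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
none_active = np.zeros(2)
first_active = np.array([1.0, 0.0])
sig_none = sigmoid_mix(none_active, C)      # 0.5 everywhere: cannot express "all off"
or_none = noisy_or_mix(none_active, C)      # 0 everywhere
or_first = noisy_or_mix(first_active, C)    # [1, 0, 1]
```

Under the sigmoid, overlapping causes add up and a zero-bias model cannot predict "off" when no cause is active. The noisy-OR saturates at 1 and falls to 0 when nothing is active.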